Subspace modeling and selection for noisy speech recognition
Abstract
This paper presents a new subspace modeling and selection approach for noisy speech recognition. For subspace modeling, we develop factor analysis (FA) to represent noisy speech. FA is a data generation model in which common factors are extracted through a factor loading matrix, together with specific factors. We connect FA to the signal subspace (SS) approach. Notably, FA partitions the noisy speech space into a principal subspace containing speech and noise and a minor subspace containing residual speech and residual noise. To estimate clean speech, we minimize the energy of speech distortion in both the principal and minor subspaces. For subspace selection, we determine the optimal subspace partition by solving hypothesis testing problems. We test the equality of the eigenvalues in the minor subspace to determine the subspace dimension. In keeping with the FA model, we further test the hypothesis that the residual speech is uncorrelated. The solutions are obtained through likelihood ratio tests, with approximate chi-square distributions for the test statistics. The subspace partition is chosen according to the confidence with which the null hypotheses are rejected. In experiments on the Aurora2 database, FA outperforms SS in subspace modeling, and the new selection algorithms effectively determine the subspace dimension for noisy speech recognition.
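The eigenvalue-equality test mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: it assumes the classical likelihood ratio test for equality of the q smallest covariance eigenvalues, with statistic n·q·ln(arithmetic mean / geometric mean) and (q+2)(q-1)/2 degrees of freedom, and it approximates the chi-square quantile with the Wilson-Hilferty formula so that only the standard library is needed. The function names and the `alpha` parameter are hypothetical.

```python
import math
from statistics import NormalDist


def chi2_quantile(prob, df):
    """Approximate chi-square quantile (Wilson-Hilferty approximation)."""
    z = NormalDist().inv_cdf(prob)
    c = 2.0 / (9.0 * df)
    return df * (1.0 - c + z * math.sqrt(c)) ** 3


def equal_tail_statistic(tail, n_samples):
    """LR statistic for H0: all eigenvalues in `tail` are equal.

    Assumes strictly positive eigenvalues. The statistic is
    n * q * ln(arithmetic mean / geometric mean), clamped at zero
    to guard against rounding error.
    """
    q = len(tail)
    log_arith_mean = math.log(sum(tail) / q)
    mean_log = sum(math.log(v) for v in tail) / q
    return max(0.0, n_samples * q * (log_arith_mean - mean_log))


def select_dimension(eigvals, n_samples, alpha=0.05):
    """Return the smallest k for which equality of the p - k smallest
    eigenvalues is NOT rejected at level alpha (assumed selection rule)."""
    lam = sorted(eigvals, reverse=True)
    p = len(lam)
    for k in range(p - 1):
        tail = lam[k:]
        q = len(tail)
        df = (q + 2) * (q - 1) // 2  # degrees of freedom of the test
        stat = equal_tail_statistic(tail, n_samples)
        if stat <= chi2_quantile(1.0 - alpha, df):
            return k  # minor subspace = the remaining q directions
    return p - 1  # a single remaining eigenvalue is trivially "equal"
```

For example, `select_dimension([10.0, 5.0, 1.0, 1.0, 1.0], n_samples=200)` returns 2: the two dominant eigenvalues form the principal subspace, and the three equal ones form the minor subspace. In practice the eigenvalues would come from the sample covariance of noisy speech frames, and an exact chi-square quantile could replace the approximation.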
Similar resources
An Information-Theoretic Discussion of Convolutional Bottleneck Features for Robust Speech Recognition
Convolutional Neural Networks (CNNs) have demonstrated their performance in speech recognition systems, both for feature extraction and for acoustic modeling. In addition, CNNs have been used for robust speech recognition, and competitive results have been reported. The Convolutive Bottleneck Network (CBN) is a kind of CNN that has a bottleneck layer among its fully connected layers. The bottleneck fea...
Speech Emotion Recognition Based on Power Normalized Cepstral Coefficients in Noisy Conditions
In recent years, automatic recognition of emotional states from speech in noisy conditions has become an important research topic in emotional speech recognition. This paper considers the recognition of emotional states via speech in real environments. For this task, we employ the power normalized cepstral coefficients (PNCC) in a speech emotion recognition system. We investigate its perfor...
Speech Enhancement Through an Optimized Subspace Division Technique
Speech enhancement techniques are often employed to improve the quality and intelligibility of noisy speech signals. This paper discusses a novel technique for speech enhancement based on Singular Value Decomposition. The implementation uses a Genetic Algorithm based optimization method to reduce the effects of environmental noise on the singular vectors as well as t...
Single channel speech enhancement using principal component analysis and MDL subspace selection
We present in this paper a novel subspace approach for single channel speech enhancement and speech recognition in highly noisy environments. Our algorithm is based on principal component analysis and the optimal subspace selection is provided by a minimum description length criterion. This choice overcomes the limitations encountered with other selection criteria, like the overestimation of th...